MG7: A fast horizontally scalable tool based on cloud computing and graph databases for microbial community profiling
Abstract
Methods: MG7 is an open-source tool implemented in Java and Scala, based on cloud computing (Amazon Web Services). The graph data platform Bio4j (www.bio4j.com) is used for retrieving taxonomy-related information, while Nispero (http://ohnosequences.com/nispero) is used for distributing and coordinating compute tasks.

Results: MG7 is an open-source, fast and horizontally scalable tool for community profiling based on the analysis of 16S metagenomics data. It is entirely cloud-based and specifically designed to take advantage of the cloud: it performs the community profiling of a sample starting from raw Illumina reads in approximately 1 hour, and needs approximately the same time to do the same for hundreds of samples, automatically adjusting the computation capacity to the resources needed in each project. The taxonomic assignment can be done using a Best BLAST Hit paradigm or a Lowest Common Ancestor paradigm; the user can choose between the two assignment algorithms and set the similarity parameters required for the assignment. As output, MG7 generates the frequencies of all the identified taxa in any of the samples in tab-separated value text files as well as in the standard BIOM format, which is compatible with other metagenomics tools. This output includes direct assignment frequencies and cumulative frequencies based on the hierarchical structure of the taxonomy tree. It also provides output files suitable for generating heat-map representations.

MG7 is an open-source tool available under the AGPLv3 license. This project is funded in part by the ITN FP7 project INTERCROSSING (Grant 289974) and the Spanish CDTI (Centro para el Desarrollo Tecnológico Industrial) grant NEXTMICRO, ref. IDI-20120242.
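The two assignment paradigms and the difference between direct and cumulative frequencies mentioned in the abstract can be illustrated with a small sketch. This is not MG7's code (MG7 is written in Java and Scala and retrieves taxonomy from Bio4j); it is a minimal Python illustration under stated assumptions: the taxonomy is a toy child-to-parent map with made-up node names, each read comes with a list of hypothetical (taxon, score) BLAST hits, and the function names are invented for this example.

```python
from collections import Counter

# Toy taxonomy as child -> parent links (illustrative names, not real taxon IDs).
PARENT = {
    "root": None,
    "Bacteria": "root",
    "Firmicutes": "Bacteria",
    "Proteobacteria": "Bacteria",
    "Bacillus": "Firmicutes",
    "Lactobacillus": "Firmicutes",
    "Escherichia": "Proteobacteria",
}

def lineage(taxon):
    """Return the path from the root down to `taxon`."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path[::-1]

def best_hit(hits):
    """Best-BLAST-hit assignment: the taxon of the highest-scoring hit."""
    return max(hits, key=lambda h: h[1])[0]

def lowest_common_ancestor(hits):
    """LCA assignment: the deepest taxon shared by the lineages of all hits."""
    lineages = [lineage(taxon) for taxon, _ in hits]
    lca = None
    for level in zip(*lineages):
        if len(set(level)) != 1:
            break
        lca = level[0]
    return lca

def cumulative_counts(direct):
    """Credit each taxon's direct count to itself and to all of its ancestors."""
    total = Counter()
    for taxon, n in direct.items():
        for node in lineage(taxon):
            total[node] += n
    return total

# Hypothetical hits per read: (taxon, score).
reads = {
    "read1": [("Bacillus", 250.0), ("Lactobacillus", 240.0)],
    "read2": [("Escherichia", 300.0)],
    "read3": [("Bacillus", 280.0), ("Bacillus", 275.0)],
}

direct_bbh = Counter(best_hit(h) for h in reads.values())
direct_lca = Counter(lowest_common_ancestor(h) for h in reads.values())
cumulative = cumulative_counts(direct_lca)
```

In this toy input, read1 has hits in two genera of the same phylum, so the best-hit rule assigns it to Bacillus while the LCA rule backs off to Firmicutes. The cumulative table then credits every direct assignment to all ancestors of the assigned node, which is why the root count equals the number of assigned reads.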
Publication date: 2014